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This paper discusses a heuristic solution procedure for a combinatorial 
optimization problem that originates in designing signal constellations 
for modems. 

The design problem is to place m signals in a two-dimensional space 
to minimize the average error rate under specified noise conditions, using 
a maximum-likelihood decoding scheme. Intuitively, it amounts {roughly) 
to spreading the signal points as far apart as possible, according to the 
distance measurement implied by the noise function. 

We show how this problem can be reduced to a discrete one: Given an 
I by n matrix P, and m < I, find an m-row subset M = \i\, • • • , i m ) of 
the rows of P that maximizes 

E max pij, 

i-1 i£M 

and then describe an efficient procedure for finding this maximizing set. 
Experiments indicate that the procedure is a useful tool, both for 
analysis of existing and proposed signal constellations and for finding 
new, near-optimum ones. 

I. THE PHYSICAL PROBLEM 

This paper discusses a heuristic procedure for solving a combinatorial 
optimization problem that arises in designing signal constellations for 
modems. The solution method is also applicable to the covering 
problem; we will discuss this at the end of this section. 

The underlying physical problem is the following: A digital signal 
s is to be sent through a noisy channel. In general, s may take on only 
a finite number, m, of distinct values. (In practice, m will be a power 
of 2.) Since the transmission line is an analog device, any specific s 
value is encoded for transmission by modulating a carrier wave for a 
period of time. For instance, s might take on only the two values and 
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1, in which case the modulation might be to send either of two ampli- 
tudes (amplitude modulation) or to send either of two frequencies 
(frequency modulation) . 

The modulation considered here is more complex: A specific value 
of s will be encoded as 

g(t)[_a sin o) c t + b cos u c Q, 

where o> c is the carrier frequency, g is an appropriate pulse shape, and 
a and b are the amplitudes of the sine and cosine components. 

The signal s is quantized into one of m levels; for each level, there is 
a corresponding a and b, so there are m different (a, b) pairs. 

The received signal, as always, is corrupted by noise, so a sample 
value sent as (a, b) is received as (a 1 , b'). At the receiver, a decoder 
processes this corrupted (a', b') and attempts, according to some 
criterion for minimizing the error rate, to reconstruct the (a, b) which 
was sent originally. 

The design question is: What (a, b) pairs should be chosen to 
minimize the error rate for this decoding process? 

In the version of the problem we solve, the main constraint is that 
a 2 -f- b 2 is bounded for all (a, b) pairs, which implies that peak signal 
power is bounded. A related but more difficult problem requires that 
the average of a 2 + b 2 over the (a, b) pairs is constant; this corresponds 
to a bound on average power. We discuss this problem in Section VII. 

The combinatorial optimization problem is a discrete version of this 
design question. Let us replace the continuum of points that could 
represent (a, b) values (everything inside the circle a 2 + b 2 = 1) by I 
discrete points, spread more or less uniformly throughout the region. 
We call these "allowable" signal values. Since the noise may add to the 
received amplitude, the received signal can in fact be outside this 
circle. Let us define additional n — I discrete points to represent the ad- 
ditional possible received signals that lie outside the circle a 2 + b 2 = 1. 

Now define an I by n matrix P = [ptj\ by 

Pa = probability that, if signal i were sent, it would be 

received as (discrete) point j 

i = 1, • • • , I, 2 - I, • • ' , n. 

Suppose for convenience that the chosen signal values are points 1 to 
m (i.e., rows 1 through m of P). The decoding procedure to be used is 
simply this: If j is the received signal, it is decoded as that i in 1, ■ • • , m 
for which p% is maximum. If 1, ■ ■ ■ , m have equal a priori probabilities, 
this procedure minimizes the error probability. 
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The probability that a particular signal i is decoded correctly is the 
probability that it falls into a column j where it is the largest entry. 
The probability of correct decoding using any particular set of m rows 
of P is thus the sum of the column maximum elements in those m 
rows (divided by m). The problem (at last) is to find those m rows that 
maximize this probability; these will be the signals used. More for- 
mally, find an w?-row subset M = {ii, • ■ ■ , i m \ of the rows of P that 
maximizes 

n 

V M = £ max pi j. 

We call Vm the value of the subset M. 

In the physical problem, m < I < n; as an abstract problem, the 
latter inequality is unnecessary. 

Note that the algorithm we will present is essentially insensitive to 
the characteristics of the matrix P. In practice, this means, for example, 
that any noise characteristics can be treated effectively. This includes 
not only the classical additive white noise, but also phase and ampli- 
tude jitter components. 

As an example of a matrix with quite different characteristics, 
suppose P has entries which are either or 1. A covering problem is 
"Find a minimum set of rows of P such that these rows together con- 
tain a 1 in each column of P." We can use our heuristic procedure to 
find approximate solutions for the covering problem as follows: Find 
a maximum value solution of the original problem, using m rows. If 
the value is less than the number of columns of P, increase m; if it 
equals the number of columns, decrease m. Find a new solution with 
the new m. The smallest value of m for which the value equals the 
number of columns is a minimum cover of P. 

II. A HEURISTIC PROCEDURE 

The process is based on iterative improvement of random initial 
solutions. A random set of m rows is chosen from the I possible rows. 
(In practice, of course, we can also let the procedure try to improve 
on a specific initial set.) We augment the m initial rows by one row 
chosen from the I — m unused rows, giving us m + 1 rows. We then 
compute which of these m + 1 rows contributes the least to the value 
of the set and remove it. (The row removed might well be the row 
added.) We then move to the next row in the unused ones and add it 
to the current m rows. The process terminates when all the I — m 
currently unused rows have been examined without finding a profitable 
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replacement. This defines a local optimum solution. We then iterate 
the entire procedure from a new random start. 

The resulting solutions are "1-opt" in the sense of Reference 1 — that 
is, no exchange of a single pair of rows can improve the solution. 
Although 1-opt procedures are among the weaker heuristics, the 
results are quite acceptable, as we shall see in the next section. 

The process is very fast which counteracts the lower effectiveness 
of 1-opting: a 100 by 100 problem takes about one second on the 
Honeywell 6070 (in FORTRAN A). The run time for dense matrices 
is essentially proportional to tn and independent of m. For sparse 
matrices (the situation that occurs in practice), the run time varies 
only with the number of non-zero elements, which for real problems 
is proportional to tn. 

The process is fast, partly because care is taken to do no extra work. 
In detail, a basic step of the algorithm is as follows (the next section 
contains a numerical example, which can be followed in parallel) : 

Initialization. Suppose without loss of generality that the initial rows 
are 1, • • • , m. Call this set M. Let v(i), i = 1, • • • , m, be the decrease 
in value if row i is removed from M . We will compute v. This is done 
only once per local optimum solution. 

We begin by setting v(-) = 0. Each column j (1 s£ j <[ n) con- 
tributes an increment to exactly one component of v, as follows. Find 
Xj and yj, the largest and second-largest elements among the first m 
elements of column j. Record these, and also the rows in which they 
were found, i x and i v (1 ^ i x , i y ^ m). Now, since Xj is the largest 
element in column j, it determines the contribution that column makes 
to the value of M. But if row i x were removed from M, the contribution 
of column j would be determined by y,, the second largest element. 
Thus, the decrease in value that would result if row i x were removed 
from M is Xj — yj, so we add x 3 - — yj to v{i x ). This process is done for 
each column. 

Phase 1 — Evaluation of a Replacement Row. When the initialization is 
finished, we evaluate replacement rows. Suppose row r (m + 1 ^ r ^ t) 
is the next proposed replacement. We will compute which of the m + 1 
rows in M T = {1, • • • , m, r\ decreases the value of M T least when 
removed. We will do this without examining any of the matrix P 
except for row r itself. 

Let A(i) be the change in v(i) that results if row i is removed from 
M T . By computing A, we do not need to change v unless we are actually 
going to exchange two rows. Initially, let A(i) = 0, i = 1, • • • , m, r. 
(Let y(r) = as well.) The value A(i) will be ^0 for i in M while 
A(r) ^ 0. 
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For each column j, let z = p r] , x = x h and y = y,\ we will do one 
of (i), (ii), or (Hi): 

(i) If 2 ^ y, the new element is smaller than second best; no 

action is necessary. 
(m) If y < z ^ x, z is a new second-best element. The A value for 
the row containing x, A(i x ), must be decreased by z — y, since 
the contribution to v(i x ) from column j is now x — z instead 
of x — y. 
(Hi) If z > x, we have a new largest element in the column. Add 
z — x to A(r) (x is now second largest) and subtract x — y from 
A(i x ), since row i x no longer contains the largest element in 
this column. 

Phase 2 — Determination of Which Row to Remove. We have now 
determined A(i), the change in value that would result if row i were 
removed from M T . Find the minimum among v(l) + A(l), • • • , 
v(m) + A(?n), v(r) + A(r). 

If the minimum occurs at row k 7^ r, let us say, then we must 
exchange rows r and k, update v(-), and update the records of largest 
and second-largest elements for each column. Go to Phase 3. 

If this minimum occurs at r, there is no profit in replacing one of 
1, • • • , m by r. If I — m rows have been consecutively examined 
without profit, we have finished; the set M is the local optimum solu- 
tion. Otherwise, we must set r to r + 1 (wrapping around from I to 
m + 1 if necessary) and go back to the Phase 1 calculation. 

Phase 3 — Updating After Exchange of Two Rows. As in Phase 1, we 
must perform one of (i), (ii), or (Hi) below for each column. The 
element x is the largest in M, y is the second largest, and z is the new 
element from row r. Row k is the row being ejected (1 £ & £ m). 

Case (i) : z £ y, II k 9* i x and k 5* i y , then we are not replacing 
either of the two largest elements in this column, so no updating is 
necessary. Go to the next column. 

If k = i x , we are replacing the largest element with something no 
better than third largest. We replace x by y (and i z by i y ) and find a 
new number two element in M r — call it w. Since y is now largest, we 
subtract w — y from v(i u ) and then let i v = i w . 

If k = i v , we are replacing the number two element with something 
no better than third largest. Again we search for w, the new number 
two, update v(i x ) by subtracting w — y, and update i v . 

Case (ii) : y < z ^ x. If k = i x , we are replacing the largest element 
with a new and smaller largest element. The element z replaces x as 
the largest element, and A(r) is increased by z — y. 
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If k t* i x , z becomes the new number two element, and z — y is 
subtracted from v(i x ) to reflect the smaller difference between first and 
second elements. 

Case (Hi) : x < z. If k = i x , we are replacing the largest element 
with a new largest element. The value A (r) is augmented by x — y (it 
already contains z — x from Phase 1). The element z replaces x. 

If k 7^ i x , we push re down into second place, since z is now the 
largest element, and subtract x — y from v(i x ), since x no longer 
contributes. 

After this update has been done for each column, we copy A(r) into 
v(k) and interchange rows r and k. (In the actual implementation, of 
course, row movement is just pointer manipulation.) Now go back to 
Phase 1. 

This is the end of the algorithm description. The critical part of this 
algorithm is evaluating the contribution of a row without doing any 
of the updating necessary to exchange it, and particularly without 
scanning the matrix to find any column maxima. This latter operation 
need be performed only after we have decided upon an exchange; 
furthermore, it is performed only upon a small set of columns — those 
for which the element of the row being replaced was first or second 
largest and for which the replacement element is smaller than both 
[case (i) in Phase 3, above]]. In practice, this condition holds for about 
10 percent of the columns when we actually do a replacement. Since 
typically we replace relatively few rows in proportion to the number 
examined, the time saving is large. 

III. AN EXAMPLE 

This description may be clarified by one step of an example. Suppose 
m = 3, the cost matrix is 



P = 



and the first three rows are the current set. (This is obviously not a 
probability matrix — small integers are better for exposition.) Then the 
best entries are (2, 5, 8, 7, 4, 6), a value of 32. 

Initialization. We find that «i is 5 (from column 6), v 2 is 7 (from 
columns 1, 3, and 4), and v 3 is 4 (from columns 2 and 5). Thus 
v = (5, 7, 4). Suppose we want to evaluate row 4 as a replacement for 
one of these rows. 
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Phase 1. Considering column 1, 4 > 2 so we are doing case (Hi). The 
value A 4 is augmented by 2; at the same time, A 2 is decreased by 1, 
because the element in row 2 is no longer largest in column 1. 

There is no change in column 2 [case (u)]. In column 3, row 4 
represents a new second-largest element, so A 2 is decreased by 1. There 
is no change in column 4. In column 5, add 4 to A4, and subtract 2 from 
A 3 . In column 6, decrease A x by 1. Thus, A = ( — 1, -2, —2, 6). 

Phase 2. The minimum of v t + A,- is 2, at i = 3, implying that row 
3 should be ejected. We now commence the updating operation. 

Phase 3. For column 1, the largest value is 4 (coincidentally in row 
4) and the second largest is 2 (in row 2). Decrease v 2 by 1, since row 2 
no longer contributes in this column [case (m)]. In column 2, we are 
replacing the largest element by a very small one [case (i)]; the old 
number two element becomes the new largest, and we have to search 
for the new second largest. Add 1 to v h since the largest value is a 3 
and the second largest a 2. In column 3, P4.3 represents a new and 
bigger second element; decrease v 2 by 1. No change takes place in 
column 4. In column 5, we are replacing the largest element by a new 
largest element. The value of v 4 is increased by 2(p 5 ,3 — Pa, 2)- In 
column 6, we gain a new second element, so v x is decreased by 1. 

En route, we compute the new value of the solution as 36. 

When we have finished, v = (5, 5, 8) for rows 1, 2, and 4, and the 
process continues by our considering row 5. Notice that out of six 
columns we only had to search for a second-largest element once; the 
rest of the time, it was immediately at hand. 

IV. EXPERIMENTAL RESULTS 

We have tried the procedure on several different types of data: 
matrix entries random on {0, • • • , n], random 0—1 matrices, and 
various probability matrices based on the physical problem. 

For problems formed by generating random entries in the matrix, 
the fraction of random starts producing the optimum dimLJshes as 
m/l increases (except for the trivial cases of very small or very large 
7)\), and also as n increases. Although there is significant individual 
variation, the frequency of obtaining the optimum is close to 100 
percent for small problems and still about 20 percent for problems 
with m = 10, I = 60, n = 80, the largest ranuom problems tried. (It 
should be noted that "optimum" usually means "best solution seen 
in a large number of trials"; we have strong statistical grounds for 
believing them optimal, but no proof.) 



1152 THE BELL SYSTEM TECHNICAL JOURNAL, SEPTEMBER 1973 

Table I — Typical Statistics for Small Random Problems 
(entries uniform on [0, ?i]) 
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Run time is directly proportional to In; in absolute terms, the run 
time is about 100 In microseconds per random start. 

The procedure almost always makes less than two passes through 
the I — m unused elements; the number of row replacements is roughly 
equal to m. As we mentioned above, it is only on these occasions that 
it is necessary to actually update the records of first- and second-best 
elements, and only for about 10 percent of the columns among the 
replacements is it necessary actually to scan through the m rows to 
locate a new second largest. (After initialization, it is never necessary 
to scan to find the largest.) Table I shows some typical results for these 
smaller tests. 

Limited tests on 0-1 matrices produced similar results, although the 
run time appears to be slightly lower per case. It is possible in the 
0-1 case to make several simplifications that would further decrease 
run time, and storage requirements could be drastically reduced by 
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Fig. 1 — Optimum solution. 
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using bit storage. However, we have not experimented extensively on 
0-1 matrices. This will be reported in a separate paper. 

Several experiments have been performed on various models of the 
real problem. Since the modems involved have very low error rates, 
the error transition probability p,y, i 5^ j, is small. This means that the 
matrix P has large diagonal elements, a few small elements, and a 
large number of zeros. For our purposes, the existence of large values 
is irrelevant; however, many zeros means that a storage organization 
that does not store zeros is attractive. We will discuss this shortly. 

One problem to be faced is how to represent the continuous (a, b) 
space as a set of discrete points. One crude model we studied uses a 
"honeycomb" or hexagonal scheme (shown in Fig. 1), since this is a 
reasonable approximation to circular symmetry. The inner 61 points 
represent allowable signal locations; the outer 30 are the extra points 
to take care of the set of received signals that violate the peak power 
constraint. 

For this test, P is defined as 

Pa = e if y is a neighbor of i (e « 1) 

= 1 — (« X number of neighbors) if i = j 
= otherwise. 

This set of probabilities ignores phase jitter, which adds a radially 
increasing tangential component to the error transitions. 

Figure 1 shows an optimum solution (it is easy to prove it optimum) ; 
Fig. 2 shows a local optimum differing by e. In both cases, 12 signals 
are placed symmetrically on the boundary, and the remaining I our are 
placed as well as possible in the interior. For this problem, all solutions 
were within 9e of optimum and the median within 3e (the mean ran- 
dom start is about 35c away), run times averaged 370 ms per case, and 
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Fig. 2 — Suboptimal by one unit. 
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the frequency of optimum solutions was about 10 percent. This problem 
appears to be slightly harder than random problems of this size, but 
run times are smaller. 

V. LARGER PROBLEMS 

To properly handle larger problems (e.g., to increase the resolution 
available when discretizing) a version of the program using a sparse 
matrix representation has been implemented. This involves substantial 
overhead in accessing elements of the matrix, but it is balanced by the 
fact that, when most matrix elements are zero, much less processing is 
required. For example, in the 16/61/91 (i.e., m/t/ri) problem above, 
average run time increased from 370 to 390 ms, which is not significant. 
The run time is determined predominantly by the number of non-zero 
points, which is usually proportional to In. 



Fig. 3 — "5-11" solution in pure Gaussian noise; /3 = 0.0; error rate = 1.07 X 10 -7 . 
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Table II — Some Results on Design Problems 
(m = 16, N = 0.002) 

Phase 

Run Jitter Character 

Non-zero Time of Best 

/ n Entries (s) (degrees) Solution Figure 

1. 

2. 
3. 
4. 

5. 
G. 



VI. RESULTS ON REAL PROBLEMS 

In this section we discuss some of the experimental results obtained 
from real problems, using formulas for the transition probabilities 
taken from Reference 2. The probability for the transition from 
X = (xi, Xt) to Y = ((/i, y 2 ) has the general form 

p(X, Y) = f(X, Y; (3, N„) exp \jg(X, F; /3, iV )], 

where N is the noise power of the channel and /3 is the rms phase jitter 
in the received signal. The details of / and g do not concern us here; it 
is sufficient to say that p (X, Y) drops to zero rapidly as Y gets further 
from X. For example, in the pure Gaussian noise case (/3 = 0), we have 

V(X Y)~ 1 exJ " F - X 'H 
p{X > Y) ~ 2*N exp L 2N J 

Thus, the probability matrix is quite sparse when the transition 
probabilities have been scaled and converted to integers. 

The discrete space consists of points on a square lattice, as shown in 
Fig. 3. The number of rows in P is determined by the number of lattice 
points within the circle of radius 1. To these are added exterior points 
approximating all possible additional received points. Since the radial 
probabilities drop off rapidly, this exterior layer need only be one or 
two units thick. This extra layer is indicated by the band of omitted 
points on Fig. 3. 

The run time and the storage requirements both grow with the 
number of lattice points; this limits the resolution we can use. The 
largest problem tried had 665 rows, 861 columns, and about 40,000 
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Fig. 4 — "1-5-10" solution for 1.5-degree jitter (error rate = 2.97 X 10~ 7 ). 



non-zero matrix entries. It should be noted that the matrix has a 
fourfold symmetry, so, at the price of an increase in computing time 
(in practice, about 50 percent), only 10,000 entires need be stored. 

Two types of experiments were performed. First, several constella- 
tions of intrinsic interest were used as initial solutions; the heuristic 
procedure attempted to improve upon them. Second, the procedure 
was used to produce good solutions from a large number of random 
initial configurations. For all solutions, approximate error rates were 
computed and the constellations displayed. 

Table II lists some typical parameters for several experiments at 
various sizes; Figs. 3 to 5 show the best solutions found for particular 
parameter settings. 

Each signal point is surrounded by a set of points which, when 
received, will decode into that signal point. This set of points is the 
"decision region" for that signal point. Because we have quantized a 
continuous space into small squares, the decision regions surrounding 
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Fig. 5— "1-6-9" solution for 3-degree jitter (error rate - 3.26 X 10~ 8 ). 



each signal point have "ragged edges" and are of necessity somewhat 
arbitrary. As the resolution is made finer, this effect is less serious, and 
in fact the procedure can make more subtle choices of points and of 
boundaries, so the apparent error rate decreases with increasing 
resolution. For this reason, error rate comparisons between different 
resolutions are not appropriate. However, the rates are internally 
consistent in that, for any given resolution, the solution character and 
error rates vary with noise as would be expected. 

As predicted by analytic techniques, 2 solutions like "1-5-10" and 
"1-6-9" are better for high jitter (0 > 1.5°), while "5-11" solutions 
are better in low jitter cases. (The notation will be evident after 
examining the figures.) These trends are clearly indicated in Figs. 3 
through 5, which show, respectively, the best solution (a 5-11) for zero 
phase-jitter (pure Gaussian noise) with an error rate of 1 X 10~ 7 , the 
best solution (1-5-10) for 1.5 degrees of phase jitter (error rate 
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Fig. 6 — Competing design, for 3-degree jitter (error rate = 3.47 X 10~ 6 ). 



3 X 10 -7 )> and the best solution (1-6-9) for 3 degrees of jitter (error 
rate 3 X lO" 6 ). 

For comparison, we experimented with the competing design shown 
in Fig. 6, at various levels of jitter. This design is intended to be robust 
over a wide range of jitter. This configuration does degrade less than 
the others as jitter increases, but its overall performance is very much 
inferior. Figure 6 shows that, at 3 degrees, its error rate is 3.5 X 10 -5 , a 
factor of 10 worse than the 1-6-9 configuration. These results agree 
closely with predictions of independent theoretical studies. 2 



VII. CONCLUSIONS 



As a solution to a combinatorial optimization problem, the heuristic 
procedure presented here is quite good for small-to-medium problems, 
say up to about 100 rows, even for dense matrices. It remains useful, 
but not strong, for large sparse problems. 
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As for the original design problem, the number of rows and columns 
in the matrix both rise with the resolution; thus, highly accurate 
representations are computationally expensive, so the procedure is 
generally not appropriate for generating precise answers to specific 
design questions. Rather, it is most useful in providing quick approxi- 
mate and comparative optimizations or evaluations, either to furnish 
insight or to supplement results obtained by analytic techniques. 

The extension of this technique to problems with an average power 
constraint, rather than peak power, appears to be straightforward, 
although we have not implemented it. [The average power constraint 
requires that £(a* -f &*) £ 1.] 

As the simplest solution, start with a random feasible set of m rows. 
Then, before each possible replace row is selected, test to see if it would 
violate feasibility; if so, it cannot be used. A more powerful algorithm 
would permit temporary violations of feasibility in a controlled way. 
Either of these approaches should serve reasonably well. 
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